Probabilistic PCA of censored data: accounting for uncertainties in the visualization of high-throughput single-cell qPCR data

نویسندگان

  • Florian Buettner
  • Victoria Moignard
  • Berthold Göttgens
  • Fabian J. Theis
چکیده

MOTIVATION High-throughput single-cell quantitative real-time polymerase chain reaction (qPCR) is a promising technique allowing for new insights in complex cellular processes. However, the PCR reaction can be detected only up to a certain detection limit, whereas failed reactions could be due to low or absent expression, and the true expression level is unknown. Because this censoring can occur for high proportions of the data, it is one of the main challenges when dealing with single-cell qPCR data. Principal component analysis (PCA) is an important tool for visualizing the structure of high-dimensional data as well as for identifying subpopulations of cells. However, to date it is not clear how to perform a PCA of censored data. We present a probabilistic approach that accounts for the censoring and evaluate it for two typical datasets containing single-cell qPCR data. RESULTS We use the Gaussian process latent variable model framework to account for censoring by introducing an appropriate noise model and allowing a different kernel for each dimension. We evaluate this new approach for two typical qPCR datasets (of mouse embryonic stem cells and blood stem/progenitor cells, respectively) by performing linear and non-linear probabilistic PCA. Taking the censoring into account results in a 2D representation of the data, which better reflects its known structure: in both datasets, our new approach results in a better separation of known cell types and is able to reveal subpopulations in one dataset that could not be resolved using standard PCA. AVAILABILITY AND IMPLEMENTATION The implementation was based on the existing Gaussian process latent variable model toolbox (https://github.com/SheffieldML/GPmat); extensions for noise models and kernels accounting for censoring are available at http://icb.helmholtz-muenchen.de/censgplvm.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Graph-Based Clustering Approach to Identify Cell Populations in Single-Cell RNA Sequencing Data

Introduction: The emergence of single-cell RNA-sequencing (scRNA-seq) technology has provided new information about the structure of cells, and provided data with very high resolution of the expression of different genes for each cell at a single time. One of the main uses of scRNA-seq is data clustering based on expressed genes, which sometimes leads to the detection of rare cell populations. ...

متن کامل

A new approach for data visualization problem

Data visualization is the process of transforming data, information, and knowledge into visual form, making use of humans’ natural visual capabilities which reveals relationships in data sets that are not evident from the raw data, by using mathematical techniques to reduce the number of dimensions in the data set while preserving the relevant inherent properties. In this paper, we formulated d...

متن کامل

A Graph-Based Clustering Approach to Identify Cell Populations in Single-Cell RNA Sequencing Data

Introduction: The emergence of single-cell RNA-sequencing (scRNA-seq) technology has provided new information about the structure of cells, and provided data with very high resolution of the expression of different genes for each cell at a single time. One of the main uses of scRNA-seq is data clustering based on expressed genes, which sometimes leads to the detection of rare cell populations. ...

متن کامل

The Integration of Multi-Factor Model of Capital Asset Pricing and Penalty Function for Stock Return Evaluation

One of the main concerns of investors is the evaluation of the return on investment, which is conducted using various models such as the CAPM (single-factor model), Fama-French three/five-factor models, and Roy and Shijin’s six-factor model and other models known as multi-factor models. Despite the widespread use of these models, their major drawbacks include sensitivity to unexpected changes, ...

متن کامل

تاثیر عدم قطعیت در برآورد پتانسیل خطر لرزه ای منطقه

In this paper the effects of uncertainties in the estimation of seismic hazard parameters is considered. These uncertainties are the result of the intricacy of the matter, restriction in identifying the interfering factors and lack of ability in determining the effective elements. One of the methods in estimating the seismic hazard potential is probabilistic method. In this method with the hyp...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 30  شماره 

صفحات  -

تاریخ انتشار 2014